
Feat: sno leader election config#1242

Merged
openshift-merge-robot merged 2 commits into openshift:master from eggfoobar:feat_sno_leaderelection_config
Dec 8, 2021

Conversation

@eggfoobar
Contributor

SNO clusters have limited resources, and leader election requests can put unneeded pressure on the kube-apiserver. This PR increases the retry period and lease duration for leader election, and exposes helper functions to determine topology and SNO leader election defaults on operator start-up.

Signed-off-by: ehila <ehila@redhat.com>
Comment thread pkg/config/leaderelection/leaderelection.go
@eggfoobar
Contributor Author

To give a bit more context: some of the numbers I was seeing are pasted below, along with the queries run against Prometheus. The methodology was to create two 4.9 SNO-DU clusters (leader-election-new and leader-election-oc); both had CVO scaled down, and the only difference was that leader-election-new was running 11 operators rebuilt against the modified library-go with the new leader-election numbers. I was seeing some small reductions, so I figured that if we extrapolate that to the rest of the core operators, we would see better gains in SNO-DU clusters, where we have very limited resources (currently, management workloads are pinned to 2 cores).

One thing I tried to verify was the downtime support for the APIServer in SNO clusters described in this BZ: rolling out the APIServer didn't cause any restarts of the operators. Similarly, rolling out the operator didn't cause any restarts, and the operands behaved correctly without any restarts as well.

Queries Ran

95th Quantile Over Time (CPU Usage) - API Server

| Unchanged | Changed | Delta |
| --- | --- | --- |
| 478.90296182259607 | 462.6560489574072 | -3.39 |

```
quantile_over_time(
  .95,
  sum(
    rate(container_cpu_usage_seconds_total{
      container!="POD",
      pod=~"kube-apiserver-leader-election-oc-49|kube-apiserver-leader-election-new-49",
      namespace="openshift-kube-apiserver"}[5m]
    )
  )[6h:]
) * 1000
```

Average Over Time (CPU Usage) - API Server

| Unchanged | Changed | Delta |
| --- | --- | --- |
| 429.08792083196755 | 395.3039245479713 | -7.87 |

```
avg_over_time(
  sum(
    rate(container_cpu_usage_seconds_total{
      container!="POD",
      pod=~"kube-apiserver-leader-election-oc-49|kube-apiserver-leader-election-new-49",
      namespace="openshift-kube-apiserver"}[5m]
    )
  )[6h:]
) * 1000
```

95th Quantile Over Time (CPU Usage) - Operators

| Unchanged | Changed | Delta |
| --- | --- | --- |
| 192.51093951592614 | 174.9938822722223 | -9.10 |

```
quantile_over_time(
  .95,
  sum(
    rate(container_cpu_usage_seconds_total{
      container!="POD",
      namespace=~"openshift-apiserver-operator|openshift-authentication-operator|openshift-cluster-storage-operator|openshift-console-operator|openshift-controller-manager-operator|openshift-etcd-operator|openshift-kube-apiserver-operator|openshift-kube-controller-manager-operator|openshift-kube-scheduler-operator|openshift-kube-storage-version-migrator-operator|openshift-service-ca-operator"
      }[5m]
    )
  )[6h:]
) * 1000
```

Average Over Time (CPU Usage) - Operators

| Unchanged | Changed | Delta |
| --- | --- | --- |
| 173.11949276437636 | 159.50446507412303 | -7.86 |

```
avg_over_time(
  sum(
    rate(container_cpu_usage_seconds_total{
      container!="POD",
      namespace=~"openshift-apiserver-operator|openshift-authentication-operator|openshift-cluster-storage-operator|openshift-console-operator|openshift-controller-manager-operator|openshift-etcd-operator|openshift-kube-apiserver-operator|openshift-kube-controller-manager-operator|openshift-kube-scheduler-operator|openshift-kube-storage-version-migrator-operator|openshift-service-ca-operator"
      }[5m]
    )
  )[6h:]
) * 1000
```

@deads2k
Contributor

deads2k commented Nov 8, 2021

Summarizing the overall benefit to an SNO cluster over X timeframe will be helpful for weighing the tradeoff of slightly lower CPU cost at the expense of latency of lease reacquisition.

@eggfoobar
Contributor Author

Sure thing, here's a quick summary with some new numbers. I ran the clusters under load, using kube-burner to spin up some traffic against the API server and put some workloads in the cluster. The latest numbers show a pretty consistent picture of a small reduction in CPU consumption by the changed operators. On average, by increasing leader election to 60s, I'm seeing a reduction of 1-2 millicores per operator; extrapolating that to the 30 operators we run, we could see a 30-millicore reduction on average on the low end and 60 millicores on the high end. That is a very small reduction, but the gain is meaningful in environments with significantly reduced resources. Some DU configurations, for example, pin management workloads to 2 cores. This gives them very little breathing room, and any measurable reduction is useful to give CRI-O and the kubelet more resources to operate. This does mean we reduce the responsiveness of our operators during rollouts, but the gain would mean that, if need be, we can support another operator in those limited environments.

In short, with the current numbers, we would have a worst-case non-graceful handover of 330s before the new operator takes the lease, and a best-case graceful handover of 60s, which is the added latency in lease reacquisition. The upside is that this affords resource-starved SNO deployments some more breathing room (albeit small), giving the kubelet and CRI-O more headroom or potentially allowing another operator to run.

As always with things of this nature, another pair of eyes would be fantastic on the numbers, queries and the methodology. Any input is greatly appreciated.

```go
// This method should only be called when running in an SNO Cluster.
func LeaderElectionSNOConfig(config configv1.LeaderElection) configv1.LeaderElection {

// We want to make sure we respect a 30s clock skew as well as 4 retry attempts without making
```
Contributor

Can you provide color into the 30s skew and why this duration was chosen? In the case of SNO where we are essentially talking about a single clock on the control plane, it's not directly apparent where the skew would come from in regards to a lease.

Contributor Author

That is a great call-out. The idea there was to keep that mostly intact, as it's not clear to me whether there will be SNO clusters that have a worker machine, and what that arrangement might mean for where operators land. I figured I'd err on the side of caution and keep with the set convention.


@soltysh soltysh left a comment


One nit, otherwise this lgtm

```go
} else if infraStatus.ControlPlaneTopology == configv1.SingleReplicaTopologyMode {
	snoLeaderElection := leaderelectionconverter.LeaderElectionSNOConfig(*b.leaderElection)
	b.leaderElection = &snoLeaderElection
}
```


Shouldn't that block move to WithleaderElection, since it's possible that LE is disabled and this code could accidentally turn it on if we're not careful enough?

Contributor Author

@eggfoobar eggfoobar Nov 23, 2021


That's a good point. The reason it's there is that we won't know the topology we're in until runtime. The function currently won't alter the disable flag in LE, but to make it more robust I changed the if statement to require topology == SingleReplica AND LE not disabled, and added some tests to ensure the behavior. Do you think that's a good enough safeguard against accidentally turning it on?
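A rough sketch of that safeguard, using local stand-in types rather than the real builder and configv1 structs:

```go
package main

import "fmt"

// LeaderElection is a simplified stand-in for configv1.LeaderElection;
// the timing fields are elided since only Disable matters for this check.
type LeaderElection struct {
	Disable bool
}

// TopologyMode is a stand-in for configv1.TopologyMode.
type TopologyMode string

const SingleReplicaTopologyMode TopologyMode = "SingleReplica"

// shouldApplySNODefaults mirrors the guard described above: SNO timings are
// applied only when the control-plane topology is single-replica AND the
// user has not disabled leader election, so the override can never flip the
// Disable flag back on.
func shouldApplySNODefaults(topology TopologyMode, le LeaderElection) bool {
	return topology == SingleReplicaTopologyMode && !le.Disable
}

func main() {
	fmt.Println(shouldApplySNODefaults(SingleReplicaTopologyMode, LeaderElection{}))              // true
	fmt.Println(shouldApplySNODefaults(SingleReplicaTopologyMode, LeaderElection{Disable: true})) // false
	fmt.Println(shouldApplySNODefaults("HighlyAvailable", LeaderElection{}))                      // false
}
```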

Contributor

@deads2k deads2k Dec 8, 2021


This code is still in a spot that does not have access to the user's intent. @soltysh's comment looks unaddressed to me.

EDIT: I see the value being saved in WithleaderElection

@eggfoobar
Contributor Author

eggfoobar commented Nov 23, 2021

/test unit

1 similar comment
@eggfoobar
Contributor Author

/test unit


@soltysh soltysh left a comment


/lgtm
deferring to @deads2k for approval

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Nov 26, 2021
Comment thread pkg/controller/controllercmd/builder.go Outdated
```go
klog.Warningf("unable to get cluster infrastructure status for leader election (falling back to default): %v", err)
} else if infraStatus.ControlPlaneTopology == configv1.SingleReplicaTopologyMode && !b.leaderElection.Disable {
	snoLeaderElection := leaderelectionconverter.LeaderElectionSNOConfig(*b.leaderElection)
	b.leaderElection = &snoLeaderElection
```
Contributor


This stomps on code-configured leader election if someone specified their actual values. Only in WithleaderElection do you have sufficient information to know what the user provided, so that you don't overwrite it.

How you resolve this is up to you/@soltysh, but you cannot unconditionally stomp the user's intent.

Contributor Author


I altered this a bit: I added a flag approach to keep it straightforward and introduce fewer moving parts. The idea is that if the user provides no timing configs in WithleaderElection, then it's safe to assume we can change those configs, so we fetch the infraStatus to see if we should alter the timings for SNO. @soltysh, what do you think about this approach?



How you resolve this is up to you/@soltysh, but you cannot unconditionally stomp the user's intent.

@deads2k that was also my suggestion here: https://github.com/openshift/library-go/pull/1242/files#r754222946

Contributor Author


With the recent change, @soltysh, I'm using the zero values in the timings to ensure that if the user has customized the timings, we won't alter them at runtime. That should make sure we don't stomp on the user's intent; if no values were provided, we can query the cluster and change the values if SNO is detected. That's going to be triggered here.

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2021
Comment thread pkg/config/clusterstatus/clusterstatus.go Outdated
Comment thread pkg/controller/controllercmd/builder.go Outdated
@deads2k
Contributor

deads2k commented Dec 8, 2021

/approve

please squash before merge.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 8, 2021
added functions to help default leader election config for sno clusters
updated controller builder to use sno leader election when user provides no options and we are currently running in sno control plane topology

Signed-off-by: ehila <ehila@redhat.com>

feat: added cluster infra status check for leader election

added check for sno topology for leader election config

Signed-off-by: ehila <ehila@redhat.com>

upkeep: updated warning message for clarity

Signed-off-by: ehila <ehila@redhat.com>

docs: added calculations and context

added calculations for the sno leader election numbers
added more comments around method to provide context and reasoning

Signed-off-by: ehila <ehila@redhat.com>

feat: added LE disable check and tests

added check for LE disable flag and SingleReplica before calling SNO leader election method
added tests for LeaderElectionSNOConfig and LeaderElectionDefaulting for expected behavior

Signed-off-by: ehila <ehila@redhat.com>

test: fixed test case for defaults

Signed-off-by: ehila <ehila@redhat.com>

test: changed unit test to use reflect package

changed unit test for equality check to use reflect.DeepEqual instead of assert.Equal to follow convention of other unit tests

Signed-off-by: ehila <ehila@redhat.com>

test: fix unit test when running in cluster

fixed unit test that reads in namespace from cluster as fallback

Signed-off-by: ehila <ehila@redhat.com>

fix: added nil check for infraStatus and error out if not populated

Signed-off-by: ehila <ehila@redhat.com>

fix: added flag to derive user intent on leader election

added a flag to signal if the user supplied any leader election config
added logic to allow us to override the leader election if the user has provided no configs
added unit tests for sno config logic method

Signed-off-by: ehila <ehila@redhat.com>

feat: removed unneeded method

removed GetClusterInfraStatusOrDie method, not needed

Signed-off-by: ehila <ehila@redhat.com>

refactor: changed variable name and logging message

updated flag variable name to be more explicit
inverted logic to match variable name
updated messaging to match for event recorder and logging statement

Signed-off-by: ehila <ehila@redhat.com>

feat: changed flag logic to match recommendation

Signed-off-by: ehila <ehila@redhat.com>
@eggfoobar eggfoobar force-pushed the feat_sno_leaderelection_config branch from aa75fd2 to 2612981 Compare December 8, 2021 21:25
@deads2k
Contributor

deads2k commented Dec 8, 2021

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2021
@openshift-ci
Contributor

openshift-ci Bot commented Dec 8, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, eggfoobar, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 9b73bdc into openshift:master Dec 8, 2021
